10. Practical Advice for K-Fold in sklearn
If our original data comes sorted in some way (for example, grouped by output class), then we will want to shuffle the order of the data points before splitting them into folds, or otherwise assign data points to folds at random. If we want to do this using KFold(), then we can add the "shuffle=True" parameter when setting up the cross-validation object.
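As a minimal sketch, assuming a small hypothetical dataset whose rows arrive sorted by label, the following compares KFold with and without shuffling. The data, the choice of 4 folds, and random_state=0 are illustrative assumptions, not part of the original text.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical toy data that arrives sorted by label: ten 0s, then ten 1s.
X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 10 + [1] * 10)

# Without shuffling, KFold assigns contiguous blocks of rows to folds,
# so with this sorted data every test fold holds a single class.
plain_cv = KFold(n_splits=4)
plain_test_folds = [test_idx for _, test_idx in plain_cv.split(X)]

# With shuffle=True the row order is randomized once before splitting;
# random_state (an illustrative value here) makes the shuffle reproducible.
shuffled_cv = KFold(n_splits=4, shuffle=True, random_state=0)
shuffled_test_folds = [test_idx for _, test_idx in shuffled_cv.split(X)]

for name, folds in [("no shuffle", plain_test_folds), ("shuffle", shuffled_test_folds)]:
    for i, test_idx in enumerate(folds):
        print(f"{name}, fold {i}: test labels {y[test_idx]}")
```

In the unshuffled case each test fold contains only 0s or only 1s, so a model would be evaluated on a class it barely saw in training. The same cross-validation object can be passed as the cv argument to functions such as cross_val_score.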
If we have concerns about class imbalance, then we can use the StratifiedKFold() class instead. Where KFold() assigns points to folds without regard to output class, StratifiedKFold() assigns data points to folds so that each fold has approximately the same proportion of each output class as the full dataset. This is most useful when the numbers of data points in our outcome classes are imbalanced (e.g. one class is rare compared to the others). For this class as well, we can use "shuffle=True" to shuffle the data points within each class before splitting into folds.
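A minimal sketch, assuming a hypothetical imbalanced dataset with 90 points of class 0 and 10 points of a rare class 1, shows how StratifiedKFold spreads the rare class across folds while plain, unshuffled KFold does not. The random features, fold count, and random_state are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical imbalanced data: 90 samples of class 0, 10 of the rare class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)

# StratifiedKFold needs the labels passed to split() so it can balance classes;
# shuffle=True shuffles samples within each class before assigning folds.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
strat_counts = [np.bincount(y[test_idx], minlength=2) for _, test_idx in skf.split(X, y)]

# Plain, unshuffled KFold on the same label-sorted data, for comparison.
kf = KFold(n_splits=5)
plain_counts = [np.bincount(y[test_idx], minlength=2) for _, test_idx in kf.split(X)]

print("stratified [class 0, class 1] per test fold:", [c.tolist() for c in strat_counts])
print("plain KFold [class 0, class 1] per test fold:", [c.tolist() for c in plain_counts])
```

Here every stratified test fold receives 18 points of class 0 and 2 of class 1, matching the 90/10 split of the full dataset, whereas plain KFold puts all 10 rare points into the last fold and leaves the other four test folds with none.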